Positive feedbacks and alternative stable states in forest leaf types

The emergence of alternative stable states in forest systems has significant implications for the functioning and structure of the terrestrial biosphere, yet empirical evidence remains scarce. Here, we combine global forest biodiversity observations and simulations to test for alternative stable states in the presence of evergreen and deciduous forest types. We reveal a bimodal distribution of forest leaf types across temperate regions of the Northern Hemisphere that cannot be explained by the environment alone, suggesting signatures of alternative forest states. Moreover, we empirically demonstrate the existence of positive feedbacks in tree growth, recruitment and mortality, with trees having 4–43% higher growth rates, 14–17% higher survival rates and 4–7 times higher recruitment rates when they are surrounded by trees of their own leaf type. Simulations show that the observed positive feedbacks are necessary and sufficient to generate alternative forest states, which also lead to dependency on history (hysteresis) during ecosystem transition from evergreen to deciduous forests and vice versa. We identify hotspots of bistable forest types in evergreen-deciduous ecotones, which are likely driven by soil-related positive feedbacks. These findings are integral to predicting the distribution of forest biomes, and aid to our understanding of biodiversity, carbon turnover, and terrestrial climate feedbacks.

To evaluate model uncertainty from the second partition, we implemented a stratified bootstrapping procedure 4,5 .With 100 bootstrap iterations, we sampled the training data with replacement, using biome as the stratification criterion to proportionally reflect the major bioclimatic zones in each of the 100 bootstrap samples.The size of each sampling dataset corresponded to the size of the original dataset.As a measure of prediction uncertainty, we calculated the standard deviation of predictions for each pixel across the 100 bootstrap layers.
To evaluate how well our training dataset represents the full multivariate environmental covariate space, we performed a principal-component-analysis-based approach following van den Hoogen et al. (2021).For the PCA-based approach, we projected the covariate composite into the same space using the standardized values and eigenvectors from the principal component analysis of the training data.We created the convex hulls for each of the bivariate combinations from the top principal components and classified whether each pixel falls in or outside each of these convex hulls (Fig. S9B).We used 22 principal components with 68 combinations for all covariates for the sampling dataset.To provide validation for this method, we applied two other methods for interpolation vs. extrapolation analysis.In the first method, we created convex hulls using only the leading (two, three or four) principal components (we decided on a maximum number of four principal components due to computational limitation).We then checked whether each pixel fell in or outside each of these high dimensional convex hulls (Fig. S31).In the second method, for each pixel, we computed the percentage of all 62 covariates with values falling into the range of our training data (Fig. S9A).All methods agree high extrapolation risks in tropical regions.

Spatial leave-one-out cross validation.
To account for the potential effect of spatial autocorrelation in model residuals from the random forest model (based on the spatial clustering procedure), we ran spatially buffered leave-one-out cross validation (SLOO-CV) for a series of buffer radii from 0 km to 1,000 km.In SLOO-CV, each observation is predicted based on a model that includes all data outside the respective buffer radius, resulting in ~15,000 separate random forest models for each buffer radius.Model performance was evaluated based on R 2 and RMSE.We also plotted semi-variograms for model residuals of both random cross validation and SLOO-CV (Fig. S12).To provide a reference for the full model with 62 covariates, we trained a purely spatial model (null model) using only latitude and longitude to predict BI.We then applied the same SLOO-CV procedure on the null model.Supplementary Note 4: Determinant analysis: extended analysis.
To further confirm the results as in Fig. 5, we implemented a similar procedure but using environmental principal components as predictors.We implemented principal component analysis on three groups of abiotic environmental covariates, namely 30 climatic, 10 soil, 12 topographic variables, and selected the first 3 principal components for each group, resulting in a total of 9 predictors (PC1 climate , PC2 climate , PC3 climate , PC1 soil , etc.).The first random forest model fit the forest bimodality index as a function of the 9 principal components at the global scale.The second set of models fit plot-level relative evergreen abundance as a function of the 9 principal components for all plots within forest grids with bimodal distribution (namely BI in -0.22 ~ 0.22), evergreen-dominated distribution (BI > 0.22) or deciduous-dominated distribution (BI < -0.22).We then calculated the permutation importance of each variable used in the random forest models (Fig. S11).

Supplementary Note 5:
Testing the effects of forest successional status and monoculture.
Several forest plots incorporated into our study could potentially be in a successional phase.Consequently, their present phenological type composition may be unstable and susceptible to eventual replacement by other phenological types.This factor introduces potential bias in our analysis results as these transitory forest plots do not represent the ultimate stable states.In addition, during data cleaning for our primary analysis, we excluded monoculture plots from the GFBi and FunDivEUROPE datasets to account for human management effects.Nevertheless, this approach might inadvertently exclude some valid plots as certain natural forests may also exhibit monoculture characteristics.
To address these concerns, we conducted an additional analysis similar to the methods illustrated in Fig. 1 and Fig. 2, employing late successional phase forest data and retaining monoculture plots.Existing research suggests that the Diameter at Breast Height (DBH) distribution within a plot can serve as an indicator of both the successional stage and the age of the forest 6,7 .Given the unavailability of forest age data for all our plots, we utilized the plot-mean DBH as a proxy for the successional stage and the age of the forest.We initially computed the plot-mean DBH for all GFBi plots, from which we derived the 0.75 quantile of plot-mean DBH, approximately 25 cm.We then filtered the GFBi, FIA, and FunDivEUROPE datasets to retain only those plots where the plot-mean DBH exceeded 25 cm, a criterion we used to define late-successional (and thus, stable) plots.All monoculture plots within these three datasets were retained for analysis.Subsequently, we replicated the bimodality testing across all three datasets, and the demographic analysis was conducted using the FIA dataset.The outcomes (Fig. S20) demonstrated overall consistency with our primary analysis (Fig. 1-2).Supplementary Note 6: Recruitment analysis with correction for seed source.
The pattern observed in Fig. 2C, specifically the higher recruitment rates amongst trees surrounded by their own phenological types, may be largely attributable to the availability of seed sources.For instance, a dominant species within a plot is likely to establish a substantial seed source and, as a consequence, have a higher probability of successful recruitment than other species.Given that a dominant species can either be evergreen or deciduous, this recruitment advantage might contribute to the observed trend of higher recruitment rates for evergreen trees in evergreen-dominated forests, and similarly for deciduous trees in deciduous forests.While the seed source indeed contributes to the positive feedback within phenological types, we sought to evaluate if this recruitment feedback persists even when the seed source advantage of the dominant species is accounted for.
In an attempt to control for the impact of seed source, we excluded data of the dominant species, identified as those with the most substantial number of individuals in each FIA plot.Furthermore, we retained only those forest plots where the relative evergreen abundance ranged between 0.1-0.9,ensuring that both evergreen and deciduous trees established local seed sources within each plot.Following this, we replicated the recruitment modeling illustrated in Materials and Methods (Demographic models and simulations: Modelling tree recruitment).The results (Fig. S21) exhibited patterns consistent with our primary findings (Fig. 2C).
We must acknowledge that it is challenging to fully disentangle seed availability from other feedback mechanisms, such as plant-soil interactions.While seed availability does contribute to the feedback loop, where evergreens foster more evergreens, our study reveals a bimodality consistent with bifurcation (as depicted in Figures 1-3) that cannot be explained solely by climate or soil characteristics.Instead, this bimodality is dependent on initial conditions, exhibiting hysteresis (Fig. 3E-F).The presence of a strong hysteresis effect implies that a simple recruitment mechanism based solely on seed availability may not be sufficient to generate such a potent feedback loop.This finding and our extended analysis here (Fig. S21) suggest that other factors, beyond seed availability, are likely involved in shaping and sustaining the observed bimodality and hysteresis.

Supplementary Note 7:
Validating random forest analysis using point-level soil data.
The Soil Grids dataset, in order to create global soil feature layers, employs machine learning models guided by comprehensive, spatially-explicit data on various climatic factors.This method of soil information interpolation using climatic data potentially leads to an overemphasis on the covariance between climate and soil layers, while diminishing the microscale heterogeneity of soil features.In order to examine whether this potential limitation impacts our findings, we incorporated point-level soil measurements from the World Soil Information Service (WOSIS) dataset 8 .This dataset included soil clay content, silt content, pH, sand content, nitrogen density, and coarse volumetric fraction.In order to spatially align this data with the exhaustive GFBi dataset, which comprises over 1.1 million plots globally, we chose the closest soil observation within a radius of either 250 m or 1000 m from each forest plot's center.This yielded a spatial correspondence between soil measurements and forest plots in 182 instances for the 250 m radius and in 2,346 instances for the 1000 m radius (Fig. S22A).
For the purpose of validating the consistency between results obtained from global soil layers and point-level soil observations, we executed a supplementary random forest analysis using the alignment with both the 250 m radius and the 1000 m radius.The outcomes of these analyses confirm that: i) model predictions remain substantially unchanged when point observations substitute global soil layers (with a 97% similarity in predictions of 10-fold cross-validation, as depicted in Fig. S23B & S24B); ii) a high degree of agreement exists between global layers and point observations for the majority of soil variables (Fig. S22); and iii) the relative importance of soil features in phenological type variation amongst differing cluster types is generally maintained when point observations are used in lieu of soil layers as predictors (Fig. S23A, C & S24A, C).These analyses further accentuate the critical influence of soil features in shaping global forest phenological variations.Supplementary Note 8: Spatial and environmental analysis for forest plots in bimodal clusters.
As outlined in the main body of the text, we segmented the GFBi dataset using a 10 arc-min (20km) grid-based 'fishing net' approach to establish spatial clusters (totaling 14,931), and computed a cluster-level Bimodality Index (BI) based on the forest plot information within each cluster.Subsequently, we classified these clusters into bimodal clusters (BI = -0.22~0.22),evergreen-dominated clusters (BI > 0.22), and deciduous-dominated clusters (BI < -0.22).The existence of bimodal clusters (comprising 12% of all clusters) supports the hypothesis that stable evergreen and deciduous forests can coexist in close proximity within each 20km x 20km grid cell, forming distinct patches via positive feedback mechanisms, rather than environmental filtering.This assertion can be further substantiated by an explicit exploration of the spatial distribution and environmental variation of forest plots within each bimodal cluster.
Comparison between spatial and environmental grouping.
To critically assess the consistency of the pattern, i.e., forests forming spatial patches of similar phenological type in environmentally alike regions across all bimodal clusters, we conducted an exhaustive multi-step analysis.This analysis aimed to: 1) confirm whether evergreen and deciduous trees segregate into distinct spatial patches within bimodal clusters and 2) investigate if environmental factors fully account for this patch formation.
To attain these objectives, we first classified each forest plot within every bimodal cluster into one of three categoriesevergreen, mixed, or deciduous-based on plot-level relative Evergreen Vegetation Index (relEV).Subsequently, we compared these predefined categories with five different groupings, which were established based on plot spatial coordinates, environmental conditions (inclusive of both climate and soil), randomly shuffled environmental conditions (serving as a null group), climatic conditions, and soil information.The concurrence between predefined categories and spatial location groupings would indicate distinct spatial patch formations of evergreen and deciduous forests.Similarly, strong alignment between predefined categories and environmental groupings would suggest environmental factors influencing patchy distributions.
Step 2: Implement the K-means method 9 to categorize forest plots into three spatial classes within each bimodal cluster, based on their latitude and longitude coordinates.This method groups geographically proximal plots into one class.
Step 3: Execute a Principal Component Analysis (PCA) on 62 environmental covariates for each cluster (refer to Table S2 for details), followed by K-means classification into three environmental groups based on the leading five environmental Principal Components (PCs).This method groups environmentally similar plots into one class.
Step 4: After the PCA in Step 3, randomize the order of the top ten environmental PCs across all plots and categorize the plots into three environmental classes using the K-means method, but this time based on randomized PCs.
Step 5: Conduct a PCA on 26 climatic covariates for each cluster, followed by K-means categorization into three climatic groups based on the leading five climatic PCs.This method groups climatically similar plots into one class.
Step 6: Perform a PCA on 9 soil covariates for each cluster, followed by K-means categorization into three soil groups based on the leading five soil PCs.This method groups plots with similar soil conditions into one class.
Step 7: To compare the five types of groupings (spatial, environmental, random, climate, and soil) with our predefined categories based on forest phenological composition, we employed the Adjusted Rand Index (ARI), a standard metric for comparing two partitions 10 .ARI ranges from 0 to 1, with higher values indicating better alignment between two partitions.
We repeated these steps for each bimodal cluster, yielding five ARIs per cluster.The ARI distributions were then visualized for further interpretation.Our results showed that spatial grouping yielded the highest ARI, significantly surpassing values from the other four methods, followed by soil grouping (Fig. S26).This finding strongly suggests that different forest phenological types form distinct spatial patches.The comparatively higher ARI for environmental grouping than the random group implies that environmental heterogeneity partially impacts the distribution of phenological types but does not entirely account for patch formation, as the overall match remains poor (Fig. S26).Moreover, the elevated ARI for soil grouping over climatic grouping suggests that soil variance better explain the patch formation than climate.

Visual illustration with examples.
To offer a clear illustration of the spatial arrangement and environmental variation of plots in bimodal clusters, we randomly selected three such clusters as examples (Fig. S25A, two located in the US and one in Europe).For each cluster, we depicted the coordinates of forest plots within that cluster (Fig. S25B-D), and presented the histogram of plot-level relative evergreen abundance, showcasing a bimodal distribution in all three clusters (Fig. S25H-J).Additionally, we conducted a principal component analysis (PCA) on 62 environmental covariates for each cluster (Refer to Table.S2 for a detailed description), and plotted the locations of forest plots on the environmental space delineated by the leading two principal components (Fig. S25E-G).The results indicated that evergreen forests are spatially closer to other evergreens, and deciduous forests are in closer proximity to other deciduous forests (Fig. S25B-D).However, no clear clustering of evergreen or deciduous forests was evident in the PCA space, implying these forests inhabit similar environmental conditions (Fig. S25E-G).
Comparison of environmental variance among three types of clusters.
Should climate variance primarily drive bimodality, we would expect to observe heightened climatic variance in bimodal clusters compared to deciduous-dominated or evergreen-dominated clusters.Conversely, if plant-soil interactions mainly influence bimodal patterns, higher soil variance would be expected in bimodal clusters relative to the other two types of clusters.We conducted a comparative analysis to test these hypotheses, evaluating the variance of both climatic and soil covariates, such as temperature, precipitation, and soil pH, among all plots within each cluster of the three cluster types.Our results indicate a lower climatic variance within bimodal clusters compared to both deciduous-dominated and evergreendominated clusters.Contrarily, we observed higher soil variance within bimodal clusters relative to the other cluster types (see Fig. S27).This evidence suggests that bimodal patterns are not primarily driven by increased climatic variance; rather, they are more likely a result of plant-soil interactions that produce heightened soil variance within bimodal clusters.
In summary, these detailed analyses support the assertion that forests form spatial patches of their own phenological type in environmentally similar areas within bimodal clusters.This observed pattern is likely the outcome of positive feedback mechanisms (e.g., plant-soil interactions), given the inadequacy of environmental filtering to explain such formations.Further, this interpretation is bolstered by evidence of demographic positive feedbacks presented in the main text (Fig. 2).Supplementary Note 9: Comparison between random forest model predictions and a remote sensing product.
To compare our model predictions with 10 m resolution remote sensing data for Europe (https://land.copernicus.eu/en/products/high-resolution-layer-dominant-leaf-type/dominant-leaf-type-2018#download)for validation, we implemented the following procedure.Given this dataset only includes information of coniferous and broadleaf forests, we first treated coniferous as evergreen, and broadleaf as deciduous.We also approximated each pixel as one canopy.We then computed the percentage of coniferous pixels across all forest pixels within each 100m*100m grid.
Next, we computed BI based all 100m resolution grid within each 20km*20km grid.The scatter plot between remote sensing observations and predictions from spatial random forest model shows an R 2 of 0.35, which is quite good (Fig. S32A).And the observations and predictions match well in bimodal regions (Bis in -0.22 -0.22) and evergreen-dominated regions (BIs > 0.22).We can also see that the model predicts more evergreen-dominated plots when remote-sensing showing they are "deciduous-dominated plots".This discrepancy mainly results from our treatment of approximating broadleaf as deciduous, therefore ignoring the presence of broadleaf-evergreen forests.To account for this limitation, we use the 300m resolution forest type map (https://cds.climate.copernicus.eu/cdsapp#!/dataset/10.24381/cds.006f2c9a?tab=overview) to filter the 10m resolution dataset to only retain needleleaf-evergreen, broadleaf-deciduous and mixed forests.We then apply the same comparison procedure as above.The result showed way better agreement between data and model (R 2 =0.44,Fig. S32B).Supplementary Note 10: Testing bimodality when treating leaf shedding as a continuous trait.
We recognize the potential limitations of categorizing species strictly as either evergreen or deciduous, a dichotomy that simplifies the complex spectrum of leaf shedding behaviors across different species.For example, a hypothetical tree that sheds leaves for one month per year is currently treated the same as a tree that sheds them for 11 months.
To test this, we examined whether this bimodality trend is apparent if we treat deciduous as a continuous or categorical trait.In this case we found that, whether deciduousness is treated as a category, or a continuous trait does not affect the bimodal distribution between evergreen and deciduous species.As such, it has no bearing on the overall conclusions.To demonstrate that the actual underlying distribution of leaf shedding, or deciduousness, is bimodal, we now used the most comprehensive species-level dataset on growing season length from Zohner et al. (2017), which includes data for more than 400 species from North America, Europe and Asia 11 .Fig. S33 shows that across all continents, there is a clear bimodal distribution, which was also statistically confirmed by the significant P values of a Hartigan's dip test.
Moreover, it is important to clarify that our study's focus is on regions with pronounced seasonal variations (latitudes greater than 15° N), where the distinction between deciduous and evergreen species is not only clear but also rooted in their genetic makeup.For example, deciduous trees, such as beeches, invariably lose their leaves in response to winter conditions, while evergreens, like spruces, retain their foliage throughout the year.This genetically determined behavior results in predictable patterns of leaf phenology that are consistent across individuals of the same species, despite spatial variations in the length of the leafless season.
Our study's scope is specifically tailored to understand these genetically fixed phenological patterns within a clear seasonal framework.Including additional variables, such as the precise length of the growing season for each species, while potentially enriching in a different context, would not alter the classification of a species as deciduous or evergreen within the studied latitudes.This is because, in the regions we are examining, an evergreen tree's inherent genetic traits ensure its evergreen status, and similarly, a deciduous tree remains deciduous regardless of slight variations in the growing season's length.
Our decision to adhere to this dichotomous classification system is driven by the genetic demarcation between deciduous and evergreen species in these latitudes, coupled with the practical limitations of data availability.Extending our analysis to account for the nuanced length of the growing season across the vast geographic and climatic spectrum we are studying would necessitate a level of detailed phenological data that is currently beyond our reach.Adding information on the precise growing season length of each individual in each plot is simply not possible.Species-level information would not add much to this question, as there is too much geographic variation within species, which is often higher than the variation across species.

Fig. S1 | Overview of the analyses conducted in this study.
We used forest inventory plot data with detailed forest composition information: the GFBi dataset for global scale analysis, FIA data for analysis of mainland US, and FunDivEUROPE data for analysis of Europe.The FIA and FunDivEUROPE have repeated measurements, which we used for demographic analysis.For all three datasets, we removed managed/monoculture plots to control for human impacts.We then calculated plot-level relative evergreen abundance (relEV) and extracted spatial environmental covariates for each forest plot.
In the next step, we conducted a PCA for those spatial environmental covariates and retained the ten leading PCs.With these preprocessed data, we tested the four criteria of alternative stable states described in the introduction.In addition, we used random forest models to map the bimodality in forest leaf phenology types across the global forest extent.We then calculated variable importance to quantify the key factors that drive these biogeographic patterns.The first three analyses can provide evidence for the existence of alternative stable states.The fourth analysis quantifies the global extent of alternative stable states and the key drivers.• Evergreen trees • Deciduous trees The differences between all compared pairs are highly significant (t-test p-value < 0.001)."spatial clustering" approach, for each environmental cluster, we then aggregated plot-level relative evergreen abundance to calculate the BI.To extrapolate the BI across the globe, using the "spatial clustering" and "environmental clustering" approaches, we then trained two random forest models, including the 62 environmental predictors.To determine the optimal grid size of the "fishing net" for spatial clustering approach, we trained a series of random forest models using grid sizes from 0.01 to 2 degree.We then check R 2 (A) and RMSE (B) for each random forest model with a certain grid size.

FIAFig. S2 |
Fig. S2 | Spatial autocorrelation of model residuals.Residual spatial autocorrelation, visualized as semi-variance as a function of spatial distance in (a) residuals of deciduous (DE) tree recruitment modelled with only an intercept, (b) residuals of evergreen (EV) tree recruitment modelled with only an intercept, (c) residuals of deciduous tree recruitment in the full GAM model, and (d) residuals of evergreen tree recruitment in the full GAM model.

Fig. S3 |
Fig. S3 | Anticorrelation between evergreen and deciduous stem density at the continental and global scale.A: The spatial distribution of over 15,431 forest inventory sites from the European NFIs used for the continental analysis.Colors represent the relative abundance of evergreen trees within a plot (red, 100% deciduous; blue, 100% evergreen).B: Histogram of the plot-level evergreen percentage in observed data across the Europe.The results of null model driven by environmental filtering (zero adjusted Poisson distribution) are shown as medians with 2.5% and 97.5% quantiles.C: Spearman's rank correlation coefficient between evergreen abundance and deciduous abundance in the observed data (the red bar) versus the simulated results of null model (the black histogram) with a Poisson assumption for European NFIs.D: Spearman's rank correlation coefficient between evergreen abundance and deciduous abundance in the observed data versus the simulated results of null model with a negative binomial assumption (with overdispersion correction) for European NFIs.E: Spearman's rank correlation coefficient between evergreen abundance and deciduous abundance in the observed data versus the simulated results of the null model with a negative binomial assumption for the US FIA data.F: Spearman's rank correlation coefficient between evergreen abundance and deciduous abundance in the observed data versus the simulated results of the null model with a negative binomial assumption for the GFBi dataset.

Fig. S4 |
Fig. S4 | Difference in survival (A) and recruitment probability (B) of individual tree species growing in evergreen (EV)

Fig. S5 |
Fig. S5 | Subregional analysis of con-phenological neighborhood effects.Con-phenological feedbacks in tree recruitment and survival across 5 ecoregions in eastern US within the FIA dataset.Statistical models were fit within ecoregions that contained at least 1,000 forest inventory plots after filtering (see Methods).

Fig. S6 |
Fig. S6 | Observed positive feedback in con-phenological demographics in Europe.A: Survival probability of an individual deciduous tree (DE, red) or evergreen tree (EV, blue) within a purely evergreen or deciduous forest stand.B: Individual deciduous or evergreen tree growth (stem diameter increment in cm per year) when the surrounding trees are purely evergreen or deciduous.C: Recruitment rates of deciduous or evergreen trees in deciduous or evergreen dominated forest plots.All plotted data are drawn from the 95% CI of the corresponding full model, controlling for environmental conditions and stand structure.

Fig. S7 |
Fig. S7 | Comparison between area-based relEV (weighted by basal area) and individual-based relEV (weighted by number of stems per plots).

Fig. S9 |Fig. S14 |
Fig. S9 | Map for interpolation vs. extrapolation analysis.A: map showing the percentage of predictors in each pixel with values falling into the range of our training dataset.B: interpolation proportion based on the convex hull methods using 66 combinations of randomly selected two PCs.

Fig. S15 |
Fig. S15 | Results of Grid size tuning.To determine the optimal grid size of the "fishing net" for spatial clustering

Fig. S16 |
Fig. S16 | Northern Hemisphere-wide patterns of forest leaf phenology strategies.Map of bimodality in forest leaf phenology strategies based on random forest modelling using the environmental clustering approach.Colors reflect the projected value of the bimodality index (BI), with red representing BIs < -0.22 (deciduous-dominated forest clusters) and blue representing BIs > 0.22 (evergreen-dominated forest clusters), while yellow, green to cyan colors represent BIs from -0.22 -0.22 (forest clusters with bimodal patterns).The predictions were restricted to forest regions above 15 degrees northern latitude, where 98% of the GFBi data are located.

Fig. S21 |
Fig. S21 | Recruitment analysis with seed source correction.Recruitment rates of deciduous (DE, red) or evergreen trees (EV, blue) in deciduous or evergreen dominated forest plots.All plotted data are drawn from the 95% CI of the corresponding full model, controlling for environmental conditions and stand structure.The differences between all compared pairs are highly significant (t-test p-value < 0.001).

Fig. S22 |Fig. S23 |
Fig. S22 | Comparison of soil data between Soil Grid and WoSIS.A: spatial distribution of WoSIS data.Purple points represent the 2346 locations with a match between the point-level WoSIS data and a forest inventory plot in GFBi dataset (see SI Section 7).B-G: Scatter plots showing the correlations of soil variables from the Soil Grids maps and the point-level WoSIS dataset.The correlations were evaluated for six variables, which were also used for random forest modelling: soil sand content (B, mass fraction in %), soil silt content (C, mass fraction in %), clay content (D, mass fraction in %), soil pH (E), soil nitrogen density (F, g/kg) and soil coarse fragments volumetric (G, mass fraction in %).

Fig. S25 |Fig. S26 |
Fig. S25 | Distribution of forest plots in example bimodal clusters.A: locations of three example 20km*20km bimodal clusters (in green).B-D: spatial distribution of forest plots in the corresponding clusters from left to right, with colors indicating plot-level relative evergreen abundance.E-G: locations of forest plots in PCA space spanned by the leading two PCs, for each bimodal cluster from left to right.H-J: histograms of the observed plot-level relative evergreen abundance in each cluster from left to right.

Fig. S27 |
Fig. S27 | Environmental variance comparison among bimodal, deciduous-dominated and evergreen-dominated clusters.Environmental variance within bimodal clusters is shown in green, while environmental variance among plots randomly selected from the entire GFBi dataset is shown in purple.

Fig. S29 |Fig. S30 |Fig. S31 |
Fig. S29 | Scatter-plots showing prediction of the null model (purely spatial) versus observation across buffer raii (0-100km, A-G).The black-dash lines are the fitted linear regression line.While the red solid lines represent the "y=x" line.